1

Use sex and region to create a count plot and see which combination of sex and region has the highest number of observations. In other words, in which region, and for which sex, do we observe the most observations?

library(ggplot2)

ggplot(df) + geom_count(aes(x = region, y = sex), color = "blue")

The southeast region has the highest number of observations, and within it males form the largest single group.
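The reading of the dot sizes can be cross-checked with a plain table of counts. The sketch below is only an illustration: `toy` is a small made-up data frame standing in for `df` (whose loading code is not shown here), and the counts it produces say nothing about the real data.

```r
# Hypothetical stand-in for df: only the two columns used in the plot.
toy <- data.frame(
  region = c("southeast", "southeast", "southeast",
             "northwest", "northeast", "southwest"),
  sex    = c("male", "male", "female", "female", "male", "female")
)

# Cross-tabulate region by sex; each cell is a number of observations,
# which is what geom_count() maps to dot size.
counts <- as.data.frame(table(region = toy$region, sex = toy$sex))

# Row with the largest count, i.e. the biggest dot in the plot.
top <- counts[which.max(counts$Freq), ]
print(top)
```

Running the same three steps with `df` in place of `toy` should reproduce the combination that stands out in the plot.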

2

Does the link between bmi and charges vary based on smoker?

  1. Create a point plot between the bmi and charges and color them based on smoker.

  2. Next add 2 linear trend lines to the plot using geom_smooth and make sure to color them based on smoker.

  3. Change the axes names to Body Mass Index and Dollar Charged.

  4. Add the title of the graph to read The link between BMI and Charges: Smokers and non-smokers.

  5. What do you conclude based on this graph?

ggplot(df, aes(x = bmi, y = charges, color = smoker)) +
  geom_point() +
  # color is mapped globally, so one linear fit is drawn per smoker group
  geom_smooth(method = "lm") +
  labs(title = "The link between BMI and Charges: Smokers and non-smokers") +
  xlab("Body Mass Index") +
  ylab("Dollar Charged")

This plot shows that BMI plays a bigger role for smokers: the trend line for smokers is much steeper, so charges rise sharply as a smoker's BMI increases. For non-smokers, charges increase far less as BMI goes up.
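The difference between the two trend lines can also be expressed as an interaction term in a linear model. The following is a minimal sketch on simulated data: `sim`, the slopes 80 and 1400, and the noise level are all assumptions chosen for illustration and are not estimates from `df`.

```r
# Simulated data, for illustration only (not the assignment's df).
set.seed(1)
n <- 200
sim <- data.frame(
  bmi    = runif(n, 18, 45),
  smoker = factor(rep(c("no", "yes"), each = n / 2))
)

# Assumed slopes: a shallow line for non-smokers, a steep one for smokers.
slope <- ifelse(sim$smoker == "yes", 1400, 80)
sim$charges <- 5000 + slope * sim$bmi + rnorm(n, sd = 2000)

# bmi * smoker expands to bmi + smoker + bmi:smoker.
fit <- lm(charges ~ bmi * smoker, data = sim)

# "bmi" is the slope for non-smokers; "bmi:smokeryes" is the extra
# slope for smokers, i.e. how much steeper their trend line is.
print(coef(fit))
```

A clearly positive `bmi:smokeryes` coefficient corresponds to what the two `geom_smooth()` lines show visually: the link between BMI and charges is stronger for smokers.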

3

Create a stacked bar chart for each region and fill the bars based on smoker. Next flip the coordinates (axes) and also change the theme of the graph to minimal. Add the count of the observations to each bar and adjust them to see each number in the correct position (note that we should see 8 numbers in the graph!). Next, instead of letting R choose the colors for the bars, use the function scale_fill_brewer() and a palette called “Dark2” to change the color of the bars.


ggplot(df) +
  aes(x = region, fill = smoker) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2") +
  coord_flip() +
  theme_minimal() +
  # One label per region/smoker segment, centered inside its stacked segment
  geom_text(stat = "count",
            aes(label = after_stat(count)),
            position = position_stack(vjust = 0.5))
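To confirm that the text layer really produces eight numbers, we can inspect the data that ggplot2 computes for it. This is a sketch on a made-up data frame (`toy` is hypothetical and only guarantees that every region/smoker pair occurs); it stacks the labels with `position_stack()` so that each count sits inside its own bar segment.

```r
library(ggplot2)

# Hypothetical toy data: every region/smoker pair appears at least once.
toy <- expand.grid(
  region = c("northeast", "northwest", "southeast", "southwest"),
  smoker = c("no", "yes")
)
toy <- toy[rep(seq_len(nrow(toy)), times = 1:8), ]  # 1 to 8 copies per pair

p <- ggplot(toy, aes(x = region, fill = smoker)) +
  geom_bar() +
  geom_text(stat = "count",
            aes(label = after_stat(count)),
            position = position_stack(vjust = 0.5))

# Layer 2 is geom_text(); it holds one row per label that will be drawn.
labels <- layer_data(p, 2)
print(nrow(labels))
```

The labels are split by smoker because `fill = smoker` is inherited by the text layer and defines the groups, even though text itself has no fill.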

4

Using the Plotly package in R, create an interactive chart that shows the scatter plot between bmi (in the horizontal axis), charges (in the vertical axis), and color the dots based on regions.

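library(plotly)

# Interactive scatter plot: bmi on x, charges on y, dots colored by region.
# type and mode are set explicitly so plotly does not have to guess them.
plot_ly(
  df,
  x = ~bmi,
  y = ~charges,
  color = ~region,
  type = "scatter",
  mode = "markers"
)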
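An alternative route to the same kind of interactive chart is to build a ggplot first and convert it with `ggplotly()`, which is also part of the plotly package. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not taken from `df`).

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data with the three columns the chart needs.
toy <- data.frame(
  bmi     = c(22.5, 27.1, 31.4, 35.8),
  charges = c(3200, 4100, 9800, 15200),
  region  = c("northeast", "northwest", "southeast", "southwest")
)

# Static scatter plot, same mapping as in the plot_ly() call.
p <- ggplot(toy, aes(x = bmi, y = charges, color = region)) +
  geom_point()

# Convert the ggplot object into an interactive plotly widget.
fig <- ggplotly(p)
fig
```

This can be convenient when the static ggplot already exists; calling `plot_ly()` directly keeps the dependency on ggplot2 out of the chunk.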